Region-attentive multimodal neural machine translation

Abstract

We propose a multimodal neural machine translation (MNMT) method with semantic image regions called region-attentive multimodal neural machine translation (RA-NMT). Existing studies on MNMT have mainly focused on employing global visual features or equally sized grid local visual features extracted by convolutional neural networks (CNNs) to improve translation performance. However, they neglect the effect of semantic information captured inside the visual features. This study utilizes semantic image regions extracted by object detection for MNMT and integrates visual and textual features using two modality-dependent attention mechanisms. The proposed method was implemented and verified on two neural architectures of neural machine translation (NMT): recurrent neural network (RNN) and self-attention network (SAN). Experimental results on different language pairs of the Multi30k dataset show that our proposed method improves over baselines and outperforms most of the state-of-the-art MNMT methods. Further analysis demonstrates that the proposed method can achieve better translation performance because of its better visual feature use.
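To make the idea of two modality-dependent attention mechanisms concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes scaled dot-product scoring, separate projection matrices per modality, and fusion by concatenating the two context vectors; the abstract specifies none of these. All names (`attend`, `dual_attention`, `w_text`, `w_img`) and all dimensions are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(query, keys):
    """Scaled dot-product attention; keys double as values (an assumption)."""
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ keys, weights


def dual_attention(query, text_states, region_feats, w_text, w_img):
    """Attend to text and image regions separately, then fuse.

    Each modality has its own projection and its own attention
    distribution; fusion by concatenation is an illustrative choice.
    """
    text_keys = text_states @ w_text   # (n_words, d)
    img_keys = region_feats @ w_img    # (n_regions, d)
    c_text, a_text = attend(query, text_keys)
    c_img, a_img = attend(query, img_keys)
    return np.concatenate([c_text, c_img]), a_text, a_img


# Toy inputs: 5 source words, 3 detected regions (sizes are arbitrary).
rng = np.random.default_rng(0)
d = 8
query = rng.normal(size=d)                # current decoder state
text_states = rng.normal(size=(5, 16))    # encoder outputs
region_feats = rng.normal(size=(3, 32))   # object-detector region features
w_text = rng.normal(size=(16, d))
w_img = rng.normal(size=(32, d))

context, a_text, a_img = dual_attention(
    query, text_states, region_feats, w_text, w_img
)
```

Here `context` would feed the decoder's next-word prediction, while `a_text` and `a_img` show which words and which regions were weighted most at that step.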

Similar resources

Attention-based Multimodal Neural Machine Translation

We present a novel neural machine translation (NMT) architecture associating visual and textual features for translation tasks with multiple modalities. Transformed global and regional visual features are concatenated with text to form attendable sequences which are dissipated over parallel long short-term memory (LSTM) threads to assist the encoder generating a representation for attention-bas...

Self-Attentive Residual Decoder for Neural Machine Translation

Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation. One of the reasons for their effectiveness is their ability to capture relevant source-side contextual information at each time-step prediction through an attention mechanism. However, the target-side context is solely based on the sequence model which, in practice, is prone to a re...

Doubly-Attentive Decoder for Multi-modal Neural Machine Translation

We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive decoder naturally incorporates spatial visual features obtained using pre-trained convolutional neural networks, bridging the gap between image description and translation. Our decoder learns to attend to source-language words and parts of an image independently by means of two separate attention mechanisms ...

Multimodal Attention for Neural Machine Translation

The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simult...

Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation

In state-of-the-art Neural Machine Translation, an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks, whe...

Journal

Journal title: Neurocomputing

Year: 2022

ISSN: 0925-2312, 1872-8286

DOI: https://doi.org/10.1016/j.neucom.2021.12.076